Pseudo-segment based speech recognition using neural recurrent whole-word recognizers
Authors
Abstract
In this paper, we describe a recurrent neural network based, isolated word speech recognizer. The recognizer uses two MLPs. A first, static MLP is used for classification of frames into phonemes. Next, a time compression step is applied. The resulting pseudo-segments are then used as inputs for a second, dynamic MLP that integrates the information over time to decide the current word. We apply this approach to an isolated digit recognition task and compare the results with a hybrid MLP/HMM approach using the same static MLP.
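The abstract does not say how the time compression step is carried out. As a rough illustration only, the sketch below assumes one plausible scheme: consecutive frames whose most probable phoneme class is the same are merged into a single pseudo-segment, represented by its label, its duration in frames, and the mean of the frame posteriors. The function name `compress_frames`, the merging rule, and the toy posteriors are all assumptions made for this example, not the authors' method.

```python
from itertools import groupby


def compress_frames(posteriors):
    """Merge runs of consecutive frames that share the same top class.

    posteriors: list of per-frame lists of phoneme probabilities
                (hypothetical output of the static MLP).
    Returns a list of (label, duration, mean_posterior) tuples,
    one per pseudo-segment. This merging rule is an assumption.
    """
    # Label each frame with the index of its most probable class.
    labelled = [
        (max(range(len(p)), key=p.__getitem__), p) for p in posteriors
    ]
    segments = []
    for label, run in groupby(labelled, key=lambda item: item[0]):
        frames = [p for _, p in run]
        n = len(frames)
        # Average the posteriors of all frames in the run, class by class.
        mean = [sum(col) / n for col in zip(*frames)]
        segments.append((label, n, mean))
    return segments


# Toy input: five frames over two phoneme classes.
frames = [
    [0.9, 0.1],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.4, 0.6],
    [0.1, 0.9],
]
segments = compress_frames(frames)
```

Under this assumption, the five toy frames collapse into two pseudo-segments, so the second, dynamic MLP would see a much shorter input sequence than the raw frame stream.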
Similar resources
A probabilistic framework for segment-based speech recognition
Most current speech recognizers use an observation space based on a temporal sequence of measurements extracted from fixed-length "frames" (e.g., Mel-cepstra). Given a hypothetical word or sub-word sequence, the acoustic likelihood computation always involves all observation frames, though the mapping between individual frames and internal recognizer states will depend on the hypothesized seg...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Speech Recognition Using Neural Networks
Although speech recognition products are already available in the market at present, their development is mainly based on statistical techniques which work under very specific assumptions. The work presented in this thesis investigates the feasibility of alternative approaches for solving the problem more efficiently. A speech recognizer system comprised of two distinct blocks, a Feature Extrac...
A new hybrid system based on MMI-neural networks for the RM speech recognition task
We present a hybrid speech recognition system for speaker-independent continuous speech recognition. The system combines a novel information theory based neural network (NN) paradigm and discrete hidden Markov models (HMMs), including state-of-the-art techniques like state-clustered triphones. The novel NN type is trained by an algorithm based on principles of self-organization that achieves max...
Combining multiple-type input units using recurrent neural network for LVCSR language modeling
In this paper, we investigate the use of a Recurrent Neural Network (RNN) in combining hybrid input types, namely word and pseudo-morpheme (PM) for Thai LVCSR language modeling. Similar to other neural network frameworks, there is no restriction on RNN input types. To exploit this advantage, the input vector of a proposed hybrid RNN language model (RNNLM) is a concatenated vector of word and PM...